Bayesian Models for Frequent Terms in Text
نویسندگان
چکیده
In this paper we present statistical models for text which treat words with higher frequencies of occurrence in a sensible manner, and perform better than widely used models based on the multinomial distribution on a wide range of classification tasks, with two or more classes. Our models are based on the Poisson and Negative-Binomial distributions, which keep desirable properties of simplicity and analytic tractability.
منابع مشابه
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملBayesian Methods for Frequent Terms in Text: Models of Contagion and the ∆ Statistic
Most statistical approaches to modeling text implicitly assume that informative words are rare. This assumption is usually appropriate for topical retrieval and classification tasks; however, in non-topical classification and soft-clustering problems where classes and latent variables relate to sentiment or author, informative words can be frequent. In this paper we present a comprehensive set ...
متن کاملBayesian Melding of Deterministic Models and Kriging for Analysis of Spatially Dependent Data
The link between geographic information systems and decision making approach own the invention and development of spatial data melding method. These methods combine different data sets, to achieve better results. In this paper, the Bayesian melding method for combining the measurements and outputs of deterministic models and kriging are considered. Then the ozone data in Tehran city are analyze...
متن کاملThe Family of Scale-Mixture of Skew-Normal Distributions and Its Application in Bayesian Nonlinear Regression Models
In previous studies on fitting non-linear regression models with the symmetric structure the normality is usually assumed in the analysis of data. This choice may be inappropriate when the distribution of residual terms is asymmetric. Recently, the family of scale-mixture of skew-normal distributions is the main concern of many researchers. This family includes several skewed and heavy-tailed d...
متن کامل